89 research outputs found

    Bilingual distributed word representations from document-aligned comparable data

    We propose a new model for learning bilingual word representations from non-parallel document-aligned data. Following the recent advances in word representation learning, our model learns dense real-valued word vectors, that is, bilingual word embeddings (BWEs). Unlike prior work on inducing BWEs which heavily relied on parallel sentence-aligned corpora and/or readily available translation resources such as dictionaries, the article reveals that BWEs may be learned solely on the basis of document-aligned comparable data without any additional lexical resources or syntactic information. We present a comparison of our approach with previous state-of-the-art models for learning bilingual word representations from comparable data that rely on the framework of multilingual probabilistic topic modeling (MuPTM), as well as with distributional local context-counting models. We demonstrate the utility of the induced BWEs in two semantic tasks: (1) bilingual lexicon extraction, (2) suggesting word translations in context for polysemous words. Our simple yet effective BWE-based models significantly outperform the MuPTM-based and context-counting representation models from comparable data as well as prior BWE-based models, and acquire the best reported results on both tasks for all three tested language pairs. This work was done while Ivan Vulić was a postdoctoral researcher at the Department of Computer Science, KU Leuven, supported by the PDM Kort fellowship (PDMK/14/117). The work was also supported by the SCATE project (IWT-SBO 130041) and the ERC Consolidator Grant LEXICAL: Lexical Acquisition Across Languages (648909)

    Learning unsupervised multilingual word embeddings with incremental multilingual hubs

    Recent research has discovered that a shared bilingual word embedding space can be induced by projecting monolingual word embedding spaces from two languages using a self-learning paradigm without any bilingual supervision. However, it has also been shown that for distant language pairs such fully unsupervised self-learning methods are unstable and often get stuck in poor local optima due to reduced isomorphism between starting monolingual spaces. In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues. We learn a shared multilingual embedding space for a variable number of languages by incrementally adding new languages one by one to the current multilingual space. Through the gradual language addition our method can leverage the interdependencies between the new language and all other languages in the current multilingual hub/space. We find that it is beneficial to project more distant languages later in the iterative process. Our fully unsupervised multilingual embedding spaces yield results that are on par with the state-of-the-art methods in the bilingual lexicon induction (BLI) task, and simultaneously obtain state-of-the-art scores on two downstream tasks: multilingual document classification and multilingual dependency parsing, outperforming even supervised baselines. This finding also accentuates the need to establish evaluation protocols for cross-lingual word embeddings beyond the omnipresent intrinsic BLI task in future work

    Bilingual lexicon induction by learning to combine word-level and character-level representations

    We study the problem of bilingual lexicon induction (BLI) in a setting where some translation resources are available, but unknown translations are sought for certain, possibly domain-specific terminology. We frame BLI as a classification problem for which we design a neural-network-based classification architecture composed of recurrent long short-term memory and deep feed-forward networks. The results show that word- and character-level representations each improve state-of-the-art results for BLI, and the best results are obtained by exploiting the synergy between these word- and character-level representations in the classification model

    Do Meio- and Macrobenthic Nematodes Differ in Community Composition and Body Weight Trends with Depth?

    Nematodes occur regularly in macrobenthic samples but are rarely identified from them and are thus considered exclusively a part of the meiobenthos. Our study compares the generic composition of nematode communities and their individual body weight trends with water depth in macrobenthic (>250/300 µm) samples from the deep Arctic (Canada Basin), Gulf of Mexico (GOM) and the Bermuda slope with meiobenthic samples (<45 µm) from GOM. The dry weight per individual (µg) of all macrobenthic nematodes combined showed an increasing trend with increasing water depth, while the dry weight per individual of the meiobenthic GOM nematodes showed a trend to decrease with increasing depth. Multivariate analyses showed that the macrobenthic nematode community in the GOM was more similar to the macrobenthic nematodes of the Canada Basin than to the GOM meiobenthic nematodes. In particular, the genera Enoploides, Crenopharynx, Micoletzkyia, Phanodermella were dominant in the macrobenthos and accounted for most of the difference. Relative abundance of non-selective deposit feeders (1B) significantly decreased with depth in macrobenthos but remained dominant in the meiobenthic community. The occurrence of a distinct assemblage of bigger nematodes of high dry weight per individual in the macrobenthos suggests the need to include nematodes in macrobenthic studies

    Contribution of Distinct Homeodomain DNA Binding Specificities to Drosophila Embryonic Mesodermal Cell-Specific Gene Expression Programs

    Homeodomain (HD) proteins are a large family of evolutionarily conserved transcription factors (TFs) having diverse developmental functions, often acting within the same cell types, yet many members of this family paradoxically recognize similar DNA sequences. Thus, with multiple family members having the potential to recognize the same DNA sequences in cis-regulatory elements, it is difficult to ascertain the role of an individual HD or a subclass of HDs in mediating a particular developmental function. To investigate this problem, we focused our studies on the Drosophila embryonic mesoderm where HD TFs are required to establish not only segmental identities (such as the Hox TFs), but also tissue and cell fate specification and differentiation (such as the NK-2 HDs, Six HDs and identity HDs (I-HDs)). Here we utilized the complete spectrum of DNA binding specificities determined by protein binding microarrays (PBMs) for a diverse collection of HDs to modify the nucleotide sequences of numerous mesodermal enhancers to be recognized by either no or a single subclass of HDs, and subsequently assayed the consequences of these changes on enhancer function in transgenic reporter assays. These studies show that individual mesodermal enhancers receive separate transcriptional input from both I–HD and Hox subclasses of HDs. In addition, we demonstrate that enhancers regulating upstream components of the mesodermal regulatory network are targeted by the Six class of HDs. Finally, we establish the necessity of NK-2 HD binding sequences to activate gene expression in multiple mesodermal tissues, supporting a potential role for the NK-2 HD TF Tinman (Tin) as a pioneer factor that cooperates with other factors to regulate cell-specific gene expression programs. 
Collectively, these results underscore the critical role played by HDs of multiple subclasses in inducing the unique genetic programs of individual mesodermal cells, and in coordinating the gene regulatory networks directing mesoderm development. National Institutes of Health (U.S.) (Grant R01 HG005287)

    Optimal deployment of components of cloud-hosted application for guaranteeing multitenancy isolation

    One of the challenges of deploying multitenant cloud-hosted services that are designed to use (or be integrated with) several components is how to implement the required degree of isolation between the components when there is a change in the workload. Achieving the highest degree of isolation implies deploying a component exclusively for one tenant, which leads to high resource consumption and running cost per component. A low degree of isolation allows sharing of resources, which could reduce cost, but with known limitations of performance and security interference. This paper presents a model-based algorithm together with four variants of a metaheuristic that can be used with it, to provide near-optimal solutions for deploying components of a cloud-hosted application in a way that guarantees multitenancy isolation. When the workload changes, the model-based algorithm solves an open multiclass queueing network (QN) model to determine the average number of requests that can access the components and then uses a metaheuristic to provide near-optimal solutions for deploying the components. Performance evaluation showed that the obtained solutions had low variability and percent deviation when compared to the reference/optimal solution. We also provide recommendations and best practice guidelines for deploying components in a way that guarantees the required degree of isolation

    Interstitial lung disease in children - genetic background and associated phenotypes

    Interstitial lung disease in children represents a group of rare chronic respiratory disorders. There is growing evidence that mutations in the surfactant protein C gene play a role in the pathogenesis of certain forms of pediatric interstitial lung disease. Recently, mutations in the ABCA3 transporter were found as an underlying cause of fatal respiratory failure in neonates without surfactant protein B deficiency. Especially in familial cases or in children of consanguineous parents, genetic diagnosis provides a useful tool to identify the underlying etiology of interstitial lung disease. The aim of this review is to summarize and to describe in detail the clinical features of hereditary interstitial lung disease in children. The knowledge of gene variants and associated phenotypes is crucial to identify relevant patients in clinical practice

    Radiation chemistry of solid-state carbohydrates using EMR

    We review our research of the past decade towards identification of radiation-induced radicals in solid-state sugars and sugar phosphates. Detailed models of the radical structures are obtained by combining EPR and ENDOR experiments with DFT calculations of g and proton HF tensors, with agreement in their anisotropy serving as the most important criterion. Symmetry-related and Schonland ambiguities, which may hamper such identification, are reviewed. Thermally induced transformations of initial radiation damage into more stable radicals can also be monitored in the EPR (and ENDOR) experiments and in principle provide information on stable radical formation mechanisms. Thermal annealing experiments reveal, however, that radical recombination and/or diamagnetic radiation damage is also quite important. Analysis strategies are illustrated with research on sucrose. Results on dipotassium glucose-1-phosphate and trehalose dihydrate, fructose and sorbose are also briefly discussed. Our study demonstrates that radiation damage is strongly regio-selective and that certain general principles govern the stable radical formation